A Fast and Reliable Policy Improvement Algorithm

Authors

  • Yasin Abbasi-Yadkori
  • Peter L. Bartlett
  • Stephen J. Wright
Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

Similar resources

Coverage Improvement In Wireless Sensor Networks Based On Fuzzy-Logic And Genetic Algorithm

Wireless sensor networks have been widely considered one of the most important 21st-century technologies and are used in many applications, such as environmental monitoring, security, and surveillance. Wireless sensor networks are used when it is not possible or convenient to supply signaling or power supply wires to a wireless sensor node. The wireless sensor node must be battery powered. C...


VRED: An improvement over RED algorithm by using queue length growth velocity

Active Queue Management (AQM) plays an important role in Internet congestion control. It tries to enhance congestion control and to achieve a tradeoff between bottleneck utilization and delay. Random Early Detection (RED) is the most popular active queue management algorithm that has been implemented in Internet routers, and it aims to provide low delay and low packet loss. RED al...


Diverse Exploration for Fast and Safe Policy Improvement

We study an important yet under-addressed problem of quickly and safely improving policies in online reinforcement learning domains. As its solution, we propose a novel exploration strategy, diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide DE theory explaining why diversity in behavior policies enables effective exploration ...


Improving Fast Charging Methods Using Genetic Algorithm and Coordination between Chargers in Fast Charging Station of Electric Vehicles in Order to Optimal Utilization of Power Capacity of Station

Fast charging stations are one of the most important sections in smart grids with high penetration of electric vehicles. One of the important issues in fast chargers is choosing the proper method for charging. In this paper, by defining an optimization problem with the objective of reducing the charging time, the optimal charging levels are obtained using a multi-stage current method using a gen...


Publication date: 2016